758 research outputs found

    Investigation on the AP-42 sampling method

    Full text link
    The Las Vegas area has been designated by the U.S. EPA as a serious PM10 non-attainment area. To monitor PM10 in this area, dust data have been collected quarterly using the AP-42 method. Under this method, the composite sample size (the number of sample sites) must be determined first. During actual dust data collection at each sample site, a procedure specifying the number, locations, and sizes (i.e., lengths) of incremental samples (plots) has to be followed. However, no existing rule can be used to determine the composite sample size, and it is unknown whether the required number of plots and their sizes have been validated against real data. Taking advantage of dust emission data collected with mobile sampling technologies, which are viewed as being close to actual continuous dust emission data over a roadway segment, this study investigates the optimal number of sample sites and the number and sizes of plots for the AP-42 method. To determine the number of sample sites, the optimal allocation sampling method is adopted; this method minimizes the variance of the emission estimate obtained from samples for a fixed budget. The validation of the number and sizes of plots for the AP-42 method is investigated with Monte Carlo simulation, in which plot layouts are emulated following the AP-42 method. The difference between the emission factor estimated from the emulated AP-42 method and the true emission factor is computed, and patterns of this difference versus the number and size of plots are observed. These patterns are used to derive thresholds for the number and size of plots in the AP-42 method. The results from the optimal allocation method indicate that most sample sites should be allocated to local roads, because the variance of emission and the proportion of roadway segments for this classification are significantly higher than those of most other roadway classifications. This conclusion may lead to more cost-effective sampling approaches. The Monte Carlo simulation results imply that clear patterns of improved emission factor estimation versus the number and size of plots can be observed for only three roadway classifications. This may indicate that the AP-42 method is not applicable to some roadway classifications, and thus a different data collection method, such as mobile sampling technologies, may be necessary.
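
    As a rough illustration of the optimal allocation step described above, the following Python sketch distributes a fixed total number of sample sites across roadway classifications in proportion to stratum size and emission variability, adjusted for per-site cost. The classification names, variances, and costs are hypothetical placeholders, not values from the study.

        import numpy as np

        def optimal_allocation(n_total, sizes, std_devs, costs):
            """Allocate n_total sample sites across strata.

            Cost-adjusted optimal (Neyman-type) allocation: n_h is proportional
            to N_h * S_h / sqrt(c_h), which minimizes the variance of the
            stratified emission estimate for a fixed budget.
            """
            sizes = np.asarray(sizes, dtype=float)        # N_h: segments per classification
            std_devs = np.asarray(std_devs, dtype=float)  # S_h: emission standard deviation
            costs = np.asarray(costs, dtype=float)        # c_h: cost per sample site
            weights = sizes * std_devs / np.sqrt(costs)
            alloc = n_total * weights / weights.sum()
            return np.maximum(1, np.round(alloc).astype(int))  # at least one site per stratum

        # Hypothetical roadway classifications (not the study's actual figures)
        strata = ["local", "collector", "minor arterial", "major arterial"]
        n_sites = optimal_allocation(
            n_total=40,
            sizes=[5000, 1500, 800, 400],   # number of roadway segments
            std_devs=[3.0, 1.5, 1.0, 0.8],  # emission variability
            costs=[1.0, 1.0, 1.2, 1.5],     # relative sampling cost
        )
        print(dict(zip(strata, n_sites)))

    Because the local-road stratum combines the largest segment count with the highest variability in this toy setup, it receives most of the sites, mirroring the study's qualitative finding.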

    The Devil of Face Recognition is in the Noise

    Full text link
    The growing scale of face recognition datasets empowers us to train strong convolutional networks for face recognition. While a variety of architectures and loss functions have been devised, we still have a limited understanding of the source and consequences of the label noise inherent in existing datasets. We make the following contributions: 1) We contribute cleaned subsets of popular face databases, i.e., the MegaFace and MS-Celeb-1M datasets, and build a new large-scale noise-controlled IMDb-Face dataset. 2) With the original datasets and cleaned subsets, we profile and analyze the label noise properties of MegaFace and MS-Celeb-1M. We show that a few orders of magnitude more noisy samples are needed to achieve the same accuracy yielded by a clean subset. 3) We study the association between different types of noise, i.e., label flips and outliers, and the accuracy of face recognition models. 4) We investigate ways to improve data cleanliness, including a comprehensive user study on the influence of data labeling strategies on annotation accuracy. The IMDb-Face dataset has been released at https://github.com/fwang91/IMDb-Face. (Accepted to ECCV'18.)
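
    To make the noise-versus-accuracy study concrete, the sketch below shows one common way to inject controlled label-flip noise into a labeled training set so that model accuracy can then be measured as a function of the noise rate. This is a generic illustration, not the authors' protocol; the labels, class count, and noise rates are placeholders.

        import numpy as np

        def inject_label_flips(labels, noise_rate, num_classes, seed=0):
            """Return a copy of `labels` in which a fraction `noise_rate` of
            entries is flipped uniformly at random to a different class
            (label-flip noise; outlier noise would instead replace the image)."""
            rng = np.random.default_rng(seed)
            labels = np.array(labels, copy=True)
            n_flip = int(noise_rate * len(labels))
            idx = rng.choice(len(labels), size=n_flip, replace=False)
            for i in idx:
                choices = [c for c in range(num_classes) if c != labels[i]]
                labels[i] = rng.choice(choices)
            return labels

        # Example: sweep noise rates; in a real study one would retrain and
        # evaluate a recognition model at each level.
        clean_labels = np.random.default_rng(1).integers(0, 10, size=1000)
        for rate in [0.0, 0.1, 0.2, 0.4]:
            noisy = inject_label_flips(clean_labels, rate, num_classes=10)
            print(f"target rate={rate:.1f}, observed flip rate={np.mean(noisy != clean_labels):.2f}")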

    Disentangled Causal Graph Learning for Online Unsupervised Root Cause Analysis

    Full text link
    The task of root cause analysis (RCA) is to identify the root causes of system faults/failures by analyzing system monitoring data. Efficient RCA can greatly accelerate system failure recovery and mitigate system damage or financial losses. However, previous research has mostly focused on offline RCA algorithms, which often require manually initiating the RCA process, a significant amount of time and data to train a robust model, and retraining from scratch for each new system fault. In this paper, we propose CORAL, a novel online RCA framework that can automatically trigger the RCA process and incrementally update the RCA model. CORAL consists of Trigger Point Detection, Incremental Disentangled Causal Graph Learning, and Network Propagation-based Root Cause Localization. The Trigger Point Detection component detects system state transitions automatically and in near real time; to achieve this, we develop an online trigger point detection approach based on multivariate singular spectrum analysis and cumulative sum statistics. To efficiently update the RCA model, we propose an incremental disentangled causal graph learning approach that decouples state-invariant and state-dependent information. CORAL then applies a random walk with restarts to the updated causal graph to accurately identify root causes. The online RCA process terminates when the causal graph and the generated root cause list converge. Extensive experiments on three real-world datasets, together with case studies, demonstrate the effectiveness and superiority of the proposed framework.
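
    For intuition about the localization step, here is a minimal sketch of ranking nodes by a random walk with restarts (RWR) over a causal graph's adjacency matrix. It is not the CORAL implementation; the toy graph, restart probability, and the choice to personalize the walk on the anomalous metric are illustrative assumptions.

        import numpy as np

        def random_walk_with_restart(adj, restart_vec, alpha=0.15, tol=1e-8, max_iter=1000):
            """Stationary scores of a random walk with restarts on a weighted graph.

            adj[i, j] > 0 denotes an edge i -> j (e.g., a learned causal influence).
            restart_vec concentrates probability on nodes where the fault surfaced,
            and the walk steps backward along edges, from effects toward causes.
            """
            # Column-normalize: from node v, step to one of its causes u (u -> v).
            # Dangling columns (nodes with no causes) simply return mass via the restart term.
            P = adj / np.maximum(adj.sum(axis=0, keepdims=True), 1e-12)
            r = restart_vec / restart_vec.sum()
            scores = np.copy(r)
            for _ in range(max_iter):
                new_scores = (1 - alpha) * P @ scores + alpha * r
                if np.linalg.norm(new_scores - scores, 1) < tol:
                    break
                scores = new_scores
            return scores

        # Toy causal graph over four system metrics (hypothetical):
        # db_latency -> api_latency -> error_rate, cpu -> api_latency
        adj = np.array([
            [0, 1, 0, 0],   # db_latency -> api_latency
            [0, 0, 1, 0],   # api_latency -> error_rate
            [0, 0, 0, 0],   # error_rate (symptom, no outgoing edges)
            [0, 1, 0, 0],   # cpu -> api_latency
        ], dtype=float)
        restart = np.array([0.0, 0.0, 1.0, 0.0])  # fault observed on error_rate
        print(random_walk_with_restart(adj, restart))

    The symptom node itself scores highest because of the restart mass; candidate root causes are the top-scoring remaining nodes.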

    Preliminary analysis of PGRP-LC gene and structure characteristics in bumblebees

    Get PDF
    PGRP-LC is an important pattern recognition receptor of the insect innate immune system: it recognizes peptidoglycans and activates immune signaling pathways that regulate the expression and release of antimicrobial peptides against infection. We analyzed, for the first time, the phylogeny, purification, and structure of bumblebee PGRP-LC. The results showed high conservation of PGRP-LC among the 16 bumblebee species examined, and further phylogenetic analysis showed that the PGRP-LC phylogeny of different subgenera (Subterraneobombus, Megabombus, Melanobombus, Bombus) is consistent with that of the COI gene. Additionally, the phylogeny of PGRP-LCs among Bombus, Apis, and the solitary bee Megachile rotundata coincides with the evolution of sociality in bees. Moreover, bumblebee PGRP-LC (Bl-PGRP-LC) shares the topology of Drosophila PGRP-LCx and PGRP-LCa, retaining the conserved disulfide bonds and 80% of the binding residues involved in the interaction between TCT and PGRP-LCx; Bl-PGRP-LC might therefore share similar binding characteristics with Drosophila PGRP-LCx. In addition, Bl-PGRP-LC has shorter β5 and β1 sheets, longer β2, β3, and β4 sheets, and a shallower binding groove. To characterize Bl-PGRP-LC, high-purity PGRP-LC inclusion bodies, soluble GST-tagged Bl-PGRP-LC fusion protein, and soluble pure Bl-PGRP-LC were obtained in vitro. These results will be helpful for further study of the function and structure of Bl-PGRP-LC.

    Efficient Commercial Bank Customer Credit Risk Assessment Based on LightGBM and Feature Engineering

    Full text link
    Effective control of credit risk is a key link in the steady operation of commercial banks. This paper is based mainly on the customer information dataset of a foreign commercial bank on Kaggle; we use the LightGBM algorithm to build a classifier that helps the bank judge the probability of customer credit default. The paper focuses on feature engineering, such as missing value processing, encoding, and handling imbalanced samples, which greatly improves the machine learning results. The main innovation of this paper is constructing new feature attributes on top of the original dataset, so that the classifier reaches an accuracy of 0.734 and an AUC of 0.772, exceeding many classifiers built on the same dataset. The model can provide a reference for commercial banks' credit granting and also offers feature processing ideas for other similar studies.
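
    As a rough illustration of the pipeline described above (not the paper's exact features or hyperparameters), the following sketch trains a LightGBM classifier on a tabular credit dataset and reports accuracy and AUC. The file name, column names, engineered feature, and use of scale_pos_weight for class imbalance are assumptions.

        import lightgbm as lgb
        import pandas as pd
        from sklearn.metrics import accuracy_score, roc_auc_score
        from sklearn.model_selection import train_test_split

        # Hypothetical tabular credit data; "default" is the binary target.
        df = pd.read_csv("credit_customers.csv")                 # placeholder file name
        df["income_to_loan"] = df["income"] / df["loan_amount"]  # example engineered feature
        X = pd.get_dummies(df.drop(columns=["default"]))         # simple categorical encoding
        y = df["default"]
        # Missing numeric values can be left as NaN; LightGBM handles them natively.

        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=42
        )

        # scale_pos_weight counteracts class imbalance (few defaulters vs. many non-defaulters).
        model = lgb.LGBMClassifier(
            n_estimators=500,
            learning_rate=0.05,
            scale_pos_weight=(y_train == 0).sum() / max((y_train == 1).sum(), 1),
        )
        model.fit(X_train, y_train)

        proba = model.predict_proba(X_test)[:, 1]
        print("accuracy:", accuracy_score(y_test, proba > 0.5))
        print("AUC:", roc_auc_score(y_test, proba))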

    Traditional Chinese Mind and Body Exercises for Promoting Balance Ability of Old Adults: A Systematic Review and Meta-Analysis

    Get PDF
    The purpose of this study was to provide a quantitative evaluation of the effectiveness of traditional Chinese mind and body exercises in promoting the balance ability of old adults. Eligible studies were searched extensively in electronic databases (Medline, CINAHL, SPORTDiscus, and Web of Science) up to 10 May 2016, and the reference lists of relevant publications were screened for further hits. Randomized controlled trials comparing the effects of traditional Chinese mind and body exercise (TCMBE) on the balance ability of old adults were included. The synthesized results for the Berg Balance Scale (BBS), Timed Up and Go Test (TUG), and static balance were pooled with 95% confidence intervals under a random-effects model. Ten studies met the inclusion criteria, involving a total of 1,798 participants. The meta-analysis showed that TCMBE produced no significant improvement in BBS or TUG overall, but both measures improved significantly when the intervention time was prolonged. In addition, TCMBE significantly improved static balance compared with the control group. In conclusion, old adults who practice TCMBE for at least 150 minutes per week over more than 15 weeks could improve their balance ability.
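
    For readers unfamiliar with the pooling step, the sketch below implements a basic DerSimonian-Laird random-effects meta-analysis of mean differences with a 95% confidence interval. The effect sizes and standard errors are made-up placeholders, not data from the trials included in this review.

        import numpy as np
        from scipy import stats

        def random_effects_pool(effects, std_errs):
            """DerSimonian-Laird random-effects pooling of study effect sizes.

            Returns the pooled effect, its 95% CI, and the between-study
            variance estimate tau^2.
            """
            effects = np.asarray(effects, dtype=float)
            var = np.asarray(std_errs, dtype=float) ** 2
            w_fixed = 1.0 / var
            pooled_fixed = np.sum(w_fixed * effects) / np.sum(w_fixed)
            # Cochran's Q and the DL estimate of between-study variance tau^2
            q = np.sum(w_fixed * (effects - pooled_fixed) ** 2)
            df = len(effects) - 1
            c = np.sum(w_fixed) - np.sum(w_fixed ** 2) / np.sum(w_fixed)
            tau2 = max(0.0, (q - df) / c)
            # Random-effects weights incorporate tau^2
            w_re = 1.0 / (var + tau2)
            pooled = np.sum(w_re * effects) / np.sum(w_re)
            se = np.sqrt(1.0 / np.sum(w_re))
            z = stats.norm.ppf(0.975)
            return pooled, (pooled - z * se, pooled + z * se), tau2

        # Hypothetical BBS mean differences (points) and standard errors from five trials
        pooled, ci, tau2 = random_effects_pool(
            effects=[1.2, 0.5, 2.1, -0.3, 1.0],
            std_errs=[0.6, 0.8, 0.9, 0.7, 0.5],
        )
        print(f"pooled MD = {pooled:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), tau^2 = {tau2:.3f}")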